synthetic video



Zero-shot Synthetic Video Realism Enhancement via Structure-aware Denoising

Wang, Yifan, Ji, Liya, Ke, Zhanghan, Yang, Harry, Lim, Ser-Nam, Chen, Qifeng

arXiv.org Artificial Intelligence

We propose an approach to enhancing synthetic video realism that re-renders synthetic videos from a simulator in a photorealistic fashion. Our method is a zero-shot framework built on a video diffusion foundation model without further fine-tuning, designed to preserve the multi-level structure of the synthetic video in the enhanced output across both the spatial and temporal domains. Specifically, we modify the generation/denoising process to condition on structure-aware information estimated from the synthetic video by an auxiliary model, such as depth maps, semantic maps, and edge maps, rather than extracted from the simulator. This guidance keeps the enhanced videos consistent with the original synthetic video at both the structural and semantic levels. The result is a simple yet general and powerful approach to enhancing synthetic video realism: in our experiments, it outperforms existing baselines in structural consistency with the original video while maintaining state-of-the-art photorealism.
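
The conditioning mechanism described above (denoising guided by depth, semantic, and edge maps estimated from the synthetic video) can be sketched in a few lines. The toy denoiser, structure encoder, and guidance blend below are illustrative stand-ins, not the authors' released model or code.

```python
# Hypothetical sketch (not the authors' code): one denoising step where the
# noise prediction is conditioned on structure maps (depth / semantics / edges)
# estimated from the synthetic frame, via a ControlNet-style feature residual.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StructureEncoder(nn.Module):
    """Encodes stacked structure maps into a feature residual."""
    def __init__(self, in_ch=3, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, padding=1),
        )
    def forward(self, maps):
        return self.net(maps)

class ToyDenoiser(nn.Module):
    """Stand-in for a video diffusion U-Net; predicts noise from a latent frame."""
    def __init__(self, lat_ch=4, feat_ch=64):
        super().__init__()
        self.inp = nn.Conv2d(lat_ch, feat_ch, 3, padding=1)
        self.out = nn.Conv2d(feat_ch, lat_ch, 3, padding=1)
    def forward(self, z_t, struct_residual=None):
        h = self.inp(z_t)
        if struct_residual is not None:
            h = h + struct_residual          # inject structure guidance
        return self.out(F.silu(h))

def guided_step(denoiser, encoder, z_t, structure_maps, alpha_t, alpha_prev, scale=1.0):
    """One DDIM-like, epsilon-prediction step with structure guidance (illustrative)."""
    eps_uncond = denoiser(z_t)
    eps_cond = denoiser(z_t, encoder(structure_maps))
    eps = eps_uncond + scale * (eps_cond - eps_uncond)   # CFG-style blend
    z0_hat = (z_t - (1 - alpha_t).sqrt() * eps) / alpha_t.sqrt()
    return alpha_prev.sqrt() * z0_hat + (1 - alpha_prev).sqrt() * eps

if __name__ == "__main__":
    denoiser, encoder = ToyDenoiser(), StructureEncoder()
    z_t = torch.randn(1, 4, 32, 32)      # noisy latent frame
    maps = torch.randn(1, 3, 32, 32)     # stacked depth + semantic + edge maps
    z_prev = guided_step(denoiser, encoder, z_t, maps,
                         torch.tensor(0.5), torch.tensor(0.6))
    print(z_prev.shape)
```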


AURA: Development and Validation of an Augmented Unplanned Removal Alert System using Synthetic ICU Videos

Seo, Junhyuk, Moon, Hyeyoon, Jung, Kyu-Hwan, Oh, Namkee, Kim, Taerim

arXiv.org Artificial Intelligence

Unplanned extubation (UE)--the unintended removal of an airway tube--remains a critical patient safety concern in intensive care units (ICUs), often leading to severe complications or death. Real-time UE detection has been limited, largely due to the ethical and privacy challenges of obtaining annotated ICU video data. We propose Augmented Unplanned Removal Alert (AURA), a vision-based risk detection system developed and validated entirely on a fully synthetic video dataset. By leveraging text-to-video diffusion, we generated diverse and clinically realistic ICU scenarios capturing a range of patient behaviors and care contexts. The system applies pose estimation to identify two high-risk movement patterns: collision, defined as hand entry into spatial zones near airway tubes, and agitation, quantified by the velocity of tracked anatomical keypoints. Expert assessments confirmed the realism of the synthetic data, and performance evaluations showed high accuracy for collision detection and moderate performance for agitation recognition. This work demonstrates a novel pathway for developing privacy-preserving, reproducible patient safety monitoring systems with potential for deployment in intensive care settings.
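
The two rule-based risk signals can be made concrete with a short sketch. The keypoint layout, airway-tube zone radius, and thresholds below are hypothetical, not AURA's actual parameters.

```python
# Hypothetical sketch of the two risk signals described above:
# "collision" = a hand keypoint entering a spatial zone around the airway tube,
# "agitation" = high frame-to-frame velocity of tracked anatomical keypoints.
import numpy as np

def collision_alert(hand_xy, tube_xy, radius_px=60.0):
    """True if any hand keypoint falls inside a circular zone around the tube."""
    d = np.linalg.norm(np.asarray(hand_xy) - np.asarray(tube_xy), axis=-1)
    return bool((d < radius_px).any())

def agitation_score(keypoints_t, keypoints_prev, fps=30.0):
    """Mean keypoint speed (pixels per second) between consecutive frames."""
    v = np.linalg.norm(np.asarray(keypoints_t) - np.asarray(keypoints_prev), axis=-1) * fps
    return float(v.mean())

if __name__ == "__main__":
    tube = (320, 180)                       # assumed airway-tube location in the image
    hands_now = [(400, 300), (350, 200)]    # per-frame hand keypoints from pose estimation
    kps_now = np.random.rand(17, 2) * 640
    kps_prev = kps_now + np.random.randn(17, 2) * 2
    print("collision:", collision_alert(hands_now, tube))
    print("agitation px/s:", round(agitation_score(kps_now, kps_prev), 1))
    # A frame would be flagged as agitation when this score exceeds a tuned threshold.
```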


Generative deep learning for foundational video translation in ultrasound

Bhatnagar, Nikolina Tomic Roshni, Jain, Sarthak, Lau, Connor, Liu, Tien-Yu, Gambini, Laura, Arnaout, Rima

arXiv.org Artificial Intelligence

Department of Medicine, Division of Cardiology; Bakar Computational Health Sciences Institute; UCSF-UC Berkeley Joint Program in Computational Precision Health; Department of Radiology, Center for Intelligent Imaging; University of California, San Francisco. Keywords: medical imaging, video translation, deep learning, image synthesis, ultrasound. Abstract: Deep learning (DL) has the potential to revolutionize image acquisition and interpretation across medicine; however, attention to data imbalance and missingness is required. Ultrasound data presents a particular challenge because, in addition to different views and structures, it includes several sub-modalities -- such as greyscale and color flow Doppler (CFD) -- that are often imbalanced in clinical studies. Image translation can help balance datasets but has so far been challenging for ultrasound sub-modalities. Here, we present a generative method for ultrasound CFD-greyscale video translation, trained on 54,975 videos and tested on 8,368. The method leverages pixel-wise, adversarial, and perceptual losses and utilizes two networks: one for reconstructing anatomic structures and one for denoising to achieve realistic ultrasound imaging. Average pairwise SSIM between synthetic videos and ground truth was 0.91 ± 0.04. Synthetic videos performed indistinguishably from real ones in DL classification and segmentation tasks and when evaluated by blinded clinical experts: the F1 score was 0.9 for real and 0.89 for synthetic videos, and the Dice score between real and synthetic segmentations was 0.97. Overall clinician accuracy in distinguishing real vs. synthetic videos was 54 ± 6% (42-61%), indicating realistic synthetic videos. Although trained only on heart videos, the model worked well on ultrasound spanning several clinical domains (average SSIM 0.91 ± 0.05), demonstrating foundational abilities.
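
The loss composition named in the abstract (pixel-wise, adversarial, and perceptual terms) could look roughly like the sketch below; the tiny feature extractor, discriminator output, and loss weights are stand-ins, not the paper's networks or settings.

```python
# Illustrative composite generator loss for greyscale<->CFD translation, combining
# the three terms named in the abstract: pixel-wise, adversarial, and perceptual.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFeatures(nn.Module):
    """Stand-in for a pretrained perceptual network (e.g. VGG-style features)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

def generator_loss(fake, real, disc_fake_logits, feats,
                   w_pix=1.0, w_adv=0.1, w_perc=1.0):
    pixel = F.l1_loss(fake, real)                                    # pixel-wise term
    adversarial = F.binary_cross_entropy_with_logits(                # fool the discriminator
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    perceptual = F.l1_loss(feats(fake), feats(real))                 # feature-space term
    return w_pix * pixel + w_adv * adversarial + w_perc * perceptual

if __name__ == "__main__":
    feats = TinyFeatures()
    real, fake = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)  # greyscale frames
    disc_logits = torch.randn(2, 1)                                  # discriminator output on fake
    print(generator_loss(fake, real, disc_logits, feats).item())
```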


VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding

Li, Zongxia, Wu, Xiyang, Shi, Guangyao, Qin, Yubin, Du, Hongyang, Liu, Fuxiao, Zhou, Tianyi, Manocha, Dinesh, Boyd-Graber, Jordan Lee

arXiv.org Artificial Intelligence

Vision-Language Models (VLMs) have achieved strong results in video understanding, yet a key question remains: do they truly comprehend visual content or only learn shallow correlations between vision and language? Real visual understanding, especially of physics and common sense, is essential for AI systems that interact with the physical world. Current evaluations mostly use real-world videos similar to training data, so high benchmark scores may not reflect real reasoning ability. To address this, we propose negative-control tests using videos that depict physically impossible or logically inconsistent events. We introduce VideoHallu, a synthetic dataset of physics- and commonsense-violating scenes generated with Veo2, Sora, and Kling. It includes expert-annotated question-answer pairs across four categories of violations. Tests of leading VLMs (Qwen-2.5-VL, Video-R1, VideoChat-R1) show that, despite strong results on benchmarks such as MVBench and MMVU, they often miss these violations, exposing gaps in visual reasoning. Reinforcement learning fine-tuning on VideoHallu improves recognition of such violations without reducing standard benchmark performance. Our data is available at https://github.com/zli12321/VideoHallu.git.
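
A benchmark like this is typically scored by comparing model answers against the expert annotations; the sketch below shows one minimal way to do that, with the field names and the loose string-matching metric being assumptions rather than VideoHallu's official protocol.

```python
# Minimal sketch of scoring VLM answers on a VideoHallu-style QA set: each item
# pairs a synthetic video with an expert question/answer, and a model's free-form
# answer is checked against the reference. Field names here are assumptions.
from dataclasses import dataclass

@dataclass
class QAItem:
    video_path: str
    question: str
    reference: str      # expert-annotated answer
    prediction: str     # VLM output for (video, question)

def normalize(s: str) -> str:
    return " ".join(s.lower().strip().split())

def accuracy(items):
    """Loose match: reference contained in prediction, or vice versa."""
    hits = sum(
        normalize(i.reference) in normalize(i.prediction)
        or normalize(i.prediction) in normalize(i.reference)
        for i in items
    )
    return hits / max(len(items), 1)

if __name__ == "__main__":
    items = [
        QAItem("clip_001.mp4", "Does the glass shatter when it hits the floor?",
               "No, it bounces, which violates physics", "Yes, it shatters"),
        QAItem("clip_002.mp4", "Which direction does the smoke drift?",
               "upward", "The smoke drifts upward."),
    ]
    print(f"accuracy: {accuracy(items):.2f}")
```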



Echo-Path: Pathology-Conditioned Echo Video Generation

Muhammad, Kabir Hamzah, Elbatel, Marawan, Qin, Yi, Li, Xiaomeng

arXiv.org Artificial Intelligence

Cardiovascular diseases (CVDs) remain the leading cause of mortality globally, and echocardiography is critical for diagnosing both common and congenital cardiac conditions. However, echocardiographic data for certain pathologies are scarce, hindering the development of robust automated diagnosis models. In this work, we propose Echo-Path, a novel generative framework that produces echocardiogram videos conditioned on specific cardiac pathologies. Echo-Path can synthesize realistic ultrasound video sequences that exhibit targeted abnormalities, focusing here on atrial septal defect (ASD) and pulmonary arterial hypertension (PAH). Our approach introduces a pathology-conditioning mechanism into a state-of-the-art echo video generator, allowing the model to learn and control disease-specific structural and motion patterns in the heart. Quantitative evaluation demonstrates that the synthetic videos achieve low distribution distances, indicating high visual fidelity. Clinically, the generated echoes exhibit plausible pathology markers. Furthermore, classifiers trained on our synthetic data generalize well to real data, and when the synthetic videos are used to augment real training sets, downstream diagnosis of ASD and PAH improves by 7% and 8%, respectively. Code, weights, and dataset are available here.
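
A pathology-conditioning mechanism of the kind described above might be implemented by injecting a learned label embedding into each generator block; the FiLM-style toy generator below is an assumption for illustration, not Echo-Path's architecture.

```python
# Hypothetical sketch of pathology conditioning: a learned embedding for the target
# condition (e.g. ASD, PAH, normal) modulates each block of a video generator, so
# sampling can be steered toward a chosen pathology. Labels and layers are illustrative.
import torch
import torch.nn as nn

PATHOLOGIES = ["normal", "ASD", "PAH"]

class ConditionedBlock(nn.Module):
    def __init__(self, ch, cond_dim):
        super().__init__()
        self.conv = nn.Conv3d(ch, ch, 3, padding=1)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * ch)   # FiLM-style modulation
    def forward(self, x, cond):
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        scale = scale[..., None, None, None]
        shift = shift[..., None, None, None]
        return self.conv(x) * (1 + scale) + shift

class ToyEchoGenerator(nn.Module):
    def __init__(self, ch=8, cond_dim=32):
        super().__init__()
        self.embed = nn.Embedding(len(PATHOLOGIES), cond_dim)
        self.blocks = nn.ModuleList([ConditionedBlock(ch, cond_dim) for _ in range(2)])
        self.head = nn.Conv3d(ch, 1, 1)
    def forward(self, z, pathology_idx):
        cond = self.embed(pathology_idx)
        for blk in self.blocks:
            z = torch.relu(blk(z, cond))
        return torch.sigmoid(self.head(z))    # (B, 1, T, H, W) echo-like video

if __name__ == "__main__":
    gen = ToyEchoGenerator()
    z = torch.randn(1, 8, 16, 32, 32)                        # latent video
    video = gen(z, torch.tensor([PATHOLOGIES.index("ASD")]))
    print(video.shape)
```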


AEGIS: Authenticity Evaluation Benchmark for AI-Generated Video Sequences

Li, Jieyu, Zhang, Xin, Zhou, Joey Tianyi

arXiv.org Artificial Intelligence

Recent advances in AI-generated content have fueled the rise of highly realistic synthetic videos, posing severe risks to societal trust and digital integrity. Existing benchmarks for video authenticity detection typically suffer from limited realism, insufficient scale, and inadequate complexity, and so fail to effectively evaluate modern vision-language models against sophisticated forgeries. To address this gap, we introduce AEGIS, a large-scale benchmark explicitly targeting the detection of hyper-realistic and semantically nuanced AI-generated videos. AEGIS comprises over 10,000 rigorously curated real and synthetic videos, the latter generated by diverse state-of-the-art models, including Stable Video Diffusion, CogVideoX-5B, Kling, and Sora, spanning open-source and proprietary architectures. In particular, AEGIS features specially constructed challenging subsets for robustness evaluation. Furthermore, we provide multimodal annotations spanning Semantic-Authenticity Descriptions, Motion Features, and Low-level Visual Features, facilitating authenticity detection and supporting downstream tasks such as multimodal fusion and forgery localization. Extensive experiments with advanced vision-language models show limited detection capability on the most challenging subsets of AEGIS, highlighting the dataset's complexity and realism beyond the generalization capabilities of existing models. AEGIS thus establishes a demanding evaluation benchmark for developing robust, reliable, and broadly generalizable video authenticity detection methods capable of addressing real-world forgery threats. Our dataset is available at https://huggingface.co/datasets/Clarifiedfish/AEGIS.


'Universal' detector spots AI deepfake videos with record accuracy

New Scientist

A universal deepfake detector has achieved the best accuracy yet in spotting multiple types of videos manipulated or completely generated by artificial intelligence. The technology may help flag non-consensual AI-generated pornography, deepfake scams or election misinformation videos. The widespread availability of cheap AI-powered deepfake creation tools has fuelled the out-of-control online spread of synthetic videos. Many depict women – including celebrities and even schoolgirls – in nonconsensual pornography. And deepfakes have also been used to influence political elections, as well as to enhance financial scams targeting both ordinary consumers and company executives. But most AI models trained to detect synthetic video focus on faces – which means they are most effective at spotting one specific type of deepfake, where a real person's face is swapped into an existing video.


GV-VAD : Exploring Video Generation for Weakly-Supervised Video Anomaly Detection

Cai, Suhang, Peng, Xiaohao, Wang, Chong, Cai, Xiaojie, Qian, Jiangbo

arXiv.org Artificial Intelligence

Video anomaly detection (VAD) plays a critical role in public safety applications such as intelligent surveillance. However, the rarity, unpredictability, and high annotation cost of real-world anomalies make it difficult to scale VAD datasets, which limits the performance and generalization ability of existing models. To address this challenge, we propose a generative video-enhanced weakly-supervised video anomaly detection (GV-VAD) framework that leverages text-conditioned video generation models to produce semantically controllable and physically plausible synthetic videos. These virtual videos augment the training data at low cost. In addition, a synthetic sample loss scaling strategy controls the influence of generated samples during training. Experiments show that the proposed framework outperforms state-of-the-art methods on the UCF-Crime dataset. The code is available at https://github.com/Sumutan/GV-VAD.git.
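
One plausible form of the synthetic sample loss scaling strategy is to down-weight the loss contribution of generated videos; the weighting scheme and value in this sketch are assumptions, not GV-VAD's exact formulation.

```python
# Illustrative sketch: samples drawn from generated videos contribute to the training
# loss with a reduced weight, so low-cost synthetic data augments training without
# dominating it. The weighting scheme and value are assumptions for illustration.
import torch
import torch.nn.functional as F

def weighted_anomaly_loss(scores, labels, is_synthetic, synthetic_weight=0.5):
    """
    scores:        (B,) predicted anomaly scores in [0, 1]
    labels:        (B,) weak video-level labels (1 = anomalous, 0 = normal)
    is_synthetic:  (B,) bool mask, True for generated videos
    """
    per_sample = F.binary_cross_entropy(scores, labels, reduction="none")
    weights = torch.where(is_synthetic,
                          torch.full_like(per_sample, synthetic_weight),
                          torch.ones_like(per_sample))
    return (weights * per_sample).sum() / weights.sum()

if __name__ == "__main__":
    scores = torch.tensor([0.9, 0.2, 0.7, 0.1])
    labels = torch.tensor([1.0, 0.0, 1.0, 0.0])
    synth = torch.tensor([False, False, True, True])
    print(weighted_anomaly_loss(scores, labels, synth).item())
```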